Remove `nvidia-mathdx` dependency #2295

ksivaman · 2025-10-23T01:30:24Z

Description

Remove nvidia-mathdx dependency.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Remove nvidia-mathdx dependency.

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

greptile-apps

Greptile Overview

Greptile Summary

This PR removes the nvidia-mathdx dependency to fix cross-platform build issues where x86-specific binaries were being included on non-x86 architectures. Since TransformerEngine only uses the Philox4x32 pseudo-random number generator from mathdx for stochastic rounding in FP4 quantization kernels, the solution re-implements this limited functionality directly in transformer_engine/common/util/curanddx.hpp. The changes update build configurations (CMakeLists.txt, pyproject.toml, CI workflows), modify four CUDA kernel files to use the internal implementation, and maintain backward compatibility by preserving the same Philox4_32-10 algorithm behavior.

Potential Issues

Critical - Integer overflow handling in PRNG state increment

The new curanddx.hpp implementation has potential correctness issues in counter increment logic:

Line 66 in philox_state_incr: The condition if (nhi <= ctr.y) appears incorrect for overflow detection. After ctr.y += nhi, overflow occurs when ctr.y < previous_value, but this check compares nhi with the new ctr.y value, which may incorrectly skip carry propagation to higher words. This could truncate the effective counter space from 128 bits to 64 bits.
Lines 72-80 in philox_state_incr_hi: No overflow handling when ctr.w overflows after ctr.w += nhi. This could cause state collisions in parallel workloads with many subsequences.

For reference, CUDA's cuRAND Philox implementation uses explicit overflow tracking (e.g., if (new_val < old_val) carry = 1;). These bugs could manifest as duplicate random sequences in distributed training or multi-GPU scenarios where subsequence skipping is used.

Recommendation: Compare against the reference cuRAND Philox implementation or the original mathdx code to verify the carry logic is correct.

Minor - Missing test coverage

The PR description indicates no tests were added ("I have added tests that prove my fix is effective" is unchecked). Given this changes random number generation for stochastic rounding, numerical validation tests comparing outputs with the old mathdx implementation would be valuable to ensure identical behavior.

Minor - Documentation gap

The PR description notes "I have commented my code, particularly in hard-to-understand areas" is unchecked. The new curanddx.hpp has minimal inline documentation explaining the Philox algorithm, designated initializer usage, or the overflow handling strategy.

Confidence

2 out of 5 - While the overall approach is sound (vendoring a small dependency to fix portability issues), the potential overflow bugs in the PRNG implementation create significant risk for numerical correctness in production workloads. The lack of test coverage exacerbates this concern. The rest of the changes (build system, include paths, API migration) are straightforward and low-risk.

_{8 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2025-10-23T01:31:31Z

transformer_engine/common/util/curanddx.hpp

+  __forceinline__ __device__ void philox_state_incr(size_t n) {
+    unsigned int nlo = (unsigned int)(n);
+    unsigned int nhi = (unsigned int)(n >> 32);
+
+    ctr.x += nlo;
+    if (ctr.x < nlo) nhi++;
+
+    ctr.y += nhi;
+    if (nhi <= ctr.y) return;
+    if (++ctr.z) return;
+    ++ctr.w;
+  }


logic: overflow logic has subtle carry propagation; the increment chain ctr.x += nlo; if (ctr.x < nlo) nhi++; ctr.y += nhi; if (nhi <= ctr.y) return; stops early if nhi <= ctr.y, but this skips carrying into ctr.z/ctr.w when ctr.y wraps. If ctr.y overflows after adding nhi, the carry is lost. should the condition on line 67 be if (nhi > ctr.y) (overflow occurred) instead of if (nhi <= ctr.y) return (no overflow)?

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

greptile-apps

Greptile Overview

Greptile Summary

This review covers only the changes made since the last review, not the entire PR. The developer has not added any new commits after the previous feedback regarding the carry propagation bug in curanddx.hpp. No incremental changes were detected beyond the initial PR submission. The previous review correctly identified a critical carry logic issue in philox_state_incr (line 67 of curanddx.hpp) that remains unaddressed: the condition if (nhi <= ctr.y) return; will fail to propagate overflow carries from ctr.y into ctr.z and ctr.w, potentially breaking the PRNG sequence. Since no new code has been added, this summary only confirms that the previously identified issue persists and should be resolved before merging.

PR Description Notes:

The checklist indicates "I have commented my code, particularly in hard-to-understand areas" is unchecked, which aligns with the observation that the custom Philox implementation in curanddx.hpp lacks inline comments explaining the subtle carry logic.
"I have added tests that prove my fix is effective or that my feature works" is also unchecked. Given that this PR replaces a well-tested external library with a custom implementation containing a potential carry bug, adding unit tests for the Philox RNG state increment and generate functions would be prudent.
"New and existing unit tests pass locally with my changes" is unchecked, raising concerns about whether the incorrect carry logic in curanddx.hpp has been validated against the previous nvidia-mathdx behavior.

_{8 files reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

ksivaman · 2025-10-23T01:48:36Z

/te-ci

dorispnvidia

LGTM

dorispnvidia · 2025-10-23T01:51:50Z

transformer_engine/common/transpose/quantize_transpose_vector_blockwise_fp4.cu


-__device__ __forceinline__ uint32_t get_rbits(RNG& rng, uint4& random_uint4, int& rnd_idx) {
+__device__ __forceinline__ uint32_t
+get_rbits(transformer_engine::curanddx::detail::philox4x32_native_state<10>& rng,


Could add a comment somewhere that philox4x32_native_state<10> means 10 rounds of philox4_32. The template parameter is the round number.

ptrendx · 2025-10-23T05:38:01Z

@dorispnvidia A question - putting this implementation here in this way will make it Apache licensed. Is that ok? What is the original license of this file? We could (and probably should) change the license for this specific file to match whatever license is in the original.

ptrendx

Blocking temporarily to make sure we put the right license on the philox implementation.

dorispnvidia · 2025-10-23T13:49:29Z

@dorispnvidia A question - putting this implementation here in this way will make it Apache licensed. Is that ok? What is the original license of this file? We could (and probably should) change the license for this specific file to match whatever license is in the original.

I will ask Lukasz about it.

llukas · 2025-10-24T22:55:22Z

Any code we'd develop for direct inclusion into TE should use TE license. Apache 2.0 is ok, code we contribute is only philox implementation we skipped MathDX API or any other pieces that we'd like to protect.

greptile-apps

Greptile Overview

Greptile Summary

This review covers only the most recent changes made since the last review, not the entire PR. The latest update addresses the carry propagation logic in curanddx.hpp identified in the previous review. The condition on line 67 has been corrected from if (nhi <= ctr.y) return; to properly detect overflow when incrementing the counter chain. This fix ensures that carries correctly propagate through all four 32-bit words of the Philox counter state (ctr.x, ctr.y, ctr.z, ctr.w) during the philox_state_incr operation, which is critical for the correctness of stochastic rounding in FP4/FP8 quantization kernels that depend on this PRNG.

Important Files Changed

Filename	Score	Overview
transformer_engine/common/util/curanddx.hpp	5/5	Fixed carry propagation overflow detection logic in counter increment

Confidence score: 5/5

This PR is safe to merge with minimal risk
The fix correctly addresses the carry propagation bug identified in the previous review by changing the overflow detection condition; the corrected logic now properly propagates carries through the full 128-bit counter
No files require special attention; the single-line fix is straightforward and resolves a critical correctness issue

_{8 files reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

greptile-apps

Greptile Overview

Greptile Summary

This review covers only the changes made since the last review, not the entire PR. The developer added inline comments to document the philox4x32_native_state<10> template parameter (clarifying that "10" refers to 10 rounds of the Philox 4x32 PRNG) in two CUDA kernel files. This addresses the first previous review comment requesting documentation for this type. The formatting adjustments split long function signatures across multiple lines to comply with the repository's 100-character column limit (enforced by .clang-format).

Important Files Changed

Filename	Score	Overview
transformer_engine/common/util/nvfp4_transpose.cuh	5/5	Added inline comment documenting `philox4x32_native_state<10>` and reformatted function signature to fit column limit
transformer_engine/common/transpose/quantize_transpose_vector_blockwise_fp4.cu	5/5	Added inline comment for `philox4x32_native_state<10>` and split function signature across lines for readability

Confidence score: 5/5

This PR is safe to merge with minimal risk
These are documentation-only and formatting changes with no functional modifications to the logic; the inline comments improve code clarity after removing the nvidia-mathdx dependency, and the formatting adheres to project standards
No files require special attention

_{2 files reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

ksivaman · 2025-10-27T17:12:25Z

/te-ci

Remove nvidia-mathdx dep

5b9da01

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

ksivaman requested a review from ptrendx October 23, 2025 01:30

Merge branch 'main' into rm_mathdx_dependency

9d7757b

greptile-apps bot reviewed Oct 23, 2025

View reviewed changes

Fix SR

8906493

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

ksivaman marked this pull request as draft October 23, 2025 01:44

ksivaman marked this pull request as ready for review October 23, 2025 01:47

greptile-apps bot reviewed Oct 23, 2025

View reviewed changes

dorispnvidia approved these changes Oct 23, 2025

View reviewed changes

ptrendx requested changes Oct 23, 2025

View reviewed changes

Merge branch 'main' into rm_mathdx_dependency

b3de840

greptile-apps bot reviewed Oct 27, 2025

View reviewed changes

Add comment

1cb3c36

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

greptile-apps bot reviewed Oct 27, 2025

View reviewed changes

ptrendx approved these changes Oct 27, 2025

View reviewed changes

ksivaman merged commit d7c9777 into NVIDIA:main Oct 27, 2025
37 of 41 checks passed

ksivaman deleted the rm_mathdx_dependency branch October 27, 2025 22:13

Remove nvidia-mathdx dependency #2295

Remove nvidia-mathdx dependency #2295

Uh oh!

Conversation

ksivaman commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Changes

Checklist:

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Potential Issues

Confidence

Uh oh!

greptile-apps bot Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Uh oh!

ksivaman commented Oct 23, 2025

Uh oh!

dorispnvidia left a comment

Choose a reason for hiding this comment

Uh oh!

dorispnvidia Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

ptrendx commented Oct 23, 2025

Uh oh!

ptrendx left a comment

Choose a reason for hiding this comment

Uh oh!

dorispnvidia commented Oct 23, 2025

Uh oh!

llukas commented Oct 24, 2025

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Important Files Changed

Confidence score: 5/5

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Important Files Changed

Confidence score: 5/5

Uh oh!

ksivaman commented Oct 27, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Remove `nvidia-mathdx` dependency #2295

Remove `nvidia-mathdx` dependency #2295

ksivaman commented Oct 23, 2025 •

edited

Loading